-
Notifications
You must be signed in to change notification settings - Fork 2
runtime/pprof: add debug=26257 goroutine profile with labels and reduced STW #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a new goroutine profile mode (debug=26257) that provides detailed goroutine information including labels while using a concurrent collection approach to reduce stop-the-world latency compared to debug=2. The implementation leverages the same underlying mechanism as debug=1 for concurrent stack collection while maintaining compatibility with the debug=2 output format.
- Adds a new debug mode (26257) that includes pprof labels in goroutine profiles with reduced STW time
- Implements concurrent goroutine profile collection with individual goroutine metadata tracking
- Provides comprehensive test coverage including latency benchmarks demonstrating significant performance improvements
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/runtime/pprof/pprof.go | Adds routing for new debug mode and implements stack formatting with labels |
| src/runtime/mprof.go | Extends concurrent profile collection to support goroutine metadata capture |
| src/internal/profilerecord/profilerecord.go | Defines new GoroutineInfo struct for metadata storage |
| src/runtime/pprof/pprof_test.go | Adds comprehensive tests and benchmarks for the new functionality |
| for i, r := range records { | ||
| goroutineID := infos[i].ID | ||
| state := "unknown" | ||
| if i < len(infos) && infos[i].State != "" { |
Copilot
AI
Aug 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition 'i < len(infos)' is redundant since the function parameter ensures 'infos' has the same length as 'records'. This check can be simplified to just 'infos[i].State != ""'.
| if i < len(infos) && infos[i].State != "" { | |
| if infos[i].State != "" { |
|
|
||
| // Get label information using the same format as debug=1. | ||
| var labelStr string | ||
| if i < len(labels) && labels[i] != nil { |
Copilot
AI
Aug 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the infos check, 'i < len(labels)' is redundant since labels has the same length as records. This can be simplified to 'labels[i] != nil'.
| if i < len(labels) && labels[i] != nil { | |
| if labels[i] != nil { |
| } | ||
|
|
||
| // Print "created by" information if available (skip main goroutine) | ||
| if i < len(infos) && goroutineID != 1 && infos[i].CreationPC != 0 { |
Copilot
AI
Aug 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition 'i < len(infos)' is redundant as infos has the same length as records. This can be simplified to 'goroutineID != 1 && infos[i].CreationPC != 0'.
| if i < len(infos) && goroutineID != 1 && infos[i].CreationPC != 0 { | |
| if goroutineID != 1 && infos[i].CreationPC != 0 { |
…ced STW
This adds a new goroutine profile mode (debug=26257) that emits one
entry per goroutine in a format nearly identical to debug=2 that
additionally includes any pprof labels in the header, following the [state]
segment, e.g:
goroutine 123 [select, 2 minutes]{"svc": "users"}:
The implementation of this mode is however significantly different from
debug=2, and instead based on the same underlying mechanism as debug=1.
Unlike the collection of debug=2 profiles, which walks all stacks while
the world is stopped, the implementation of this mode uses the
concurrent collection integrated with the scheduler that backs debug=1.
As a result, this new mode reduces the duration of the stop-the-world
phase, particularly when profiling processes with many goroutines,
compared to debug=2, and demonstrated by the included benchmark:
│ debug=2 │ debug=26257
│ max_latency_ns │ max_latency_ns vs base
goroutines=100x3-14 1013.17k ± 47% 84.06k ± 27% -91.70% (p=0.002 n=6)
goroutines=100x10-14 769.23k ± 7% 80.29k ± 22% -89.56% (p=0.002 n=6)
goroutines=100x50-14 2172.4k ± 9% 181.8k ± 46% -91.63% (p=0.002 n=6)
goroutines=1000x3-14 7133.9k ± 3% 195.7k ± 42% -97.26% (p=0.002 n=6)
goroutines=1000x10-14 11787.6k ± 48% 494.4k ± 77% -95.81% (p=0.002 n=6)
goroutines=1000x50-14 20234.0k ± 87% 174.8k ± 137% -99.14% (p=0.002 n=6)
goroutines=10000x3-14 68611.0k ± 49% 168.5k ± 2768% -99.75% (p=0.002 n=6)
goroutines=10000x10-14 60.261M ± 95% 3.460M ± 166% -94.26% (p=0.002 n=6)
goroutines=10000x50-14 284.144M ± 40% 4.672M ± 89% -98.36% (p=0.002 n=6)
goroutines=25000x3-14 171.290M ± 48% 4.287M ± 394% -97.50% (p=0.002 n=6)
goroutines=25000x10-14 150.827M ± 92% 6.424M ± 158% -95.74% (p=0.002 n=6)
goroutines=25000x50-14 708.238M ± 34% 2.249M ± 410% -99.68% (p=0.002 n=6)
geomean 25.08M 624.2k -97.51%
This concurrent collection approach and its relaxed consistency compared
to keeping the world stopped doesm mean the behavior of this new mode is
not exactly identical to debug=2. Additionally, currently this mode
always elides argument values when printing stacks, whereas debug=2
includes them most of the time. These behavior differences mean that
despite the performance benefits, the new implementation is not used for
the existing debug=2 mode, but rather only for the new mode making it an
opt-in alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert on the Go runtime, so take that with a grain of salt, but the approach looks good to me. What I don't understand is what the plan would be for this after this merges. Ideally, we can find a team that "nominally" owns this in case it ever causes issues (Obs? DB Server?). We would also need follow-up changes in CRDB to actually cash in on this functionality, and again, there should be some ownership there.
| goroutineProfile.records = p | ||
| goroutineProfile.labels = labels | ||
| goroutineProfile.infos = infos | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you need to also update doRecordGoroutineProfile so that it updates goroutineProfile.infos?
|
|
||
| // printGoroutineStacksWithLabels formats goroutine records in a format similar | ||
| // to runtime.Stack, but also includes labels as well. | ||
| func printGoroutineStacksWithLabels( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a smaller diff that takes the existing print code and adds an argument for it?
| func writeGoroutineStacksWithLabels(w io.Writer) error { | ||
| n, ok := pprof_goroutineStacksWithLabels(nil, nil, nil) | ||
|
|
||
| for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you confident in the robustness of this loop? Maybe worth to additionally put in a growth factor of (say) 2 on each iteration.
| } | ||
| } | ||
|
|
||
| fmt.Fprintf(w, "goroutine %d [%s%s]%s:\n", goroutineID, state, blockedStr, labelStr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one side-effect of the format change is that panicparse and friends won't accept this any more. Can you think of a way to squeeze the labels in in a way that doesn't "nominally" change the format? I can't - but thought I'd ask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we wanted to use backtrace/debug=2 format we could inject them inside the []; I previously had assumed that was strictly state[, waitSince] but looking at the 1.25 changes that add bubble info there, and the scanning states, I realized that it is expected that any number of comma separated fields can be in the [] so we could put labels there. But we can never get the args in a resume-the-world safe way to I think maybe trying to make debug=2 without STW is a fool's errand and we should focus on getting what we want (state, creator, waitSince) in debug=0/1, ala golang#74954
|
I've been convinced on my upstream proposal to pursue only adding a binary pprof format debug=3, that injects synthesized, per-goroutine labels with the extra runtime info. The collection is more or less identical to what I have here but the printing is different. |
|
I made https://davidt.io/stackgazer support binary debug=0 profiles yesterday so I'm going to rework this branch to follow the suggestion on the upstream proposal to stick to only extending the binary format instead of making new text one. The core collection logic stays the same but the format function is pretty different. One question is how much I want to mess with |
This adds a new goroutine profile mode (debug=26257) that emits one entry per goroutine in a format identical to debug=2 except that additionally includes any pprof labels in the header, following the [state] segment, e.g:
The implementation of this mode is however significantly different from debug=2, and instead based on the same underlying mechanism as debug=1. Unlike the collection of debug=2 profiles, which walks all stacks while the world is stopped, the implementation of this mode uses the concurrent collection integrated with the scheduler that backs debug=1. As a result, this new mode reduces the duration of the stop-the-world phase, particularly when profiling processes with many goroutines, compared to debug=2, and demonstrated by the included benchmark:
This concurrent collection approach and its relaxed consistency compared to keeping the world stopped doesm mean the behavior of this new mode is not exactly identical to debug=2. Thus despite the performance benefits, the new implementation is not used for the existing debug=2 mode, but rather only for the new mode making it an opt-in alternative.